Data set used for analysis:
## /// GENIND OBJECT /////////
##
## // 367 individuals; 5,554 loci; 18,606 alleles; size: 30.7 Mb
##
## // Basic content
## @tab: 367 x 18606 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 2-25)
## @loc.fac: locus factor for the 18606 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: read.genepop(file = "data/POPGEN/BMA_by_pop_genepop.gen", ncode = 3L,
## quiet = FALSE)
##
## // Optional content
## @pop: population of each individual (group size range: 20-52)
## @strata: a data frame with 10 columns ( LIB_ID, PLATE, WELL, SAMPLE_ID, LAT, LONG, ... )
Samples were collected in or near estuaries throughout the Gulf of Mexico and southwest Atlantic, individual geographic coordinates were not available for all samples. Individuals are grouped by putative populations based on sample location. Estuaries are grouped by geographic regions (Northern, Central, Western, Southern Gulf, and Southwest Atlantic). Geographic regions are categorized as being located in the Gulf and Atlantic ocean basins, though it should be noted that the Atlantic consists of a single sample.
## OGR data source with driver: ESRI Shapefile
## Source: "/home/soleary/GAFFTOPS/BMA_POPGEN/data/BASEMAPS", layer: "ne_10m_land"
## with 1 features
## It has 2 fields
## OGR data source with driver: ESRI Shapefile
## Source: "/home/soleary/GAFFTOPS/BMA_POPGEN/data/BASEMAPS", layer: "ne_10m_admin_1_states_provinces_lines"
## with 10118 features
## It has 21 fields
Sample distribution of gafftopsail catfish throughout the Gulf of Mexico and Atlantic Ocean
Sample distribution by putative population, geographic region, and ocean basin.
Number of sample per putative population, Florida Atlantic (FLA), South and North Florida (FLGS, FLGN), Alabama (MB), Mississippi (MISS), Louisiana (CS, LA), Texas (CC), and Campeche (SG).
| POP | n |
|---|---|
| FLA | 41 |
| FLGS | 51 |
| FLGN | 41 |
| MB | 52 |
| MISS | 37 |
| CS | 20 |
| LA | 48 |
| CC | 50 |
| CAMP | 27 |
Number of samples per geographic region, Southwest Atlantic (Florida), Eastern Gulf (North, South Florida), Central Gulf (Alabama, Mississippi, Louisiana), Western Gulf (Texas), Southern Gulf (Campeche).
| REGION | n |
|---|---|
| SWATL | 41 |
| EGULF | 92 |
| CGULF | 157 |
| WGULF | 50 |
| SGULF | 27 |
Number of samples per ocean basin.
| OCEAN | n |
|---|---|
| ATL | 41 |
| GULF | 326 |
Homogeneity in allele and genotype distributions among samples tested using a single-level, locus-by-locus analysis of molecular variance (AMOVA) as implemented in Arlequin.
Perform PCA; missing values in allele count matrix replaced with mean values (across all individuals).
Percent variance explained by first 25 principle components
Top three principle components retained, which explain a cumulative 2.53% variance.
Distribution of individuals on first three principle components
Run BayeScan for individuals grouped by sample location (putative populations)
BayeScan (Foll and Gaggiotti 2008Foll, Matthieu, and Oscar Gaggiotti. 2008. “A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective.” Genetics 180 (2): 977–93. https://doi.org/10.1534/genetics.108.092221.; Foll et al. 2010Foll, Matthieu, Martin C. Fischer, Gerald Heckel, and Laurent Excoffier. 2010. “Estimating population structure from AFLP amplification intensity.” Molecular Ecology 19 (21): 4638–47. https://doi.org/10.1111/j.1365-294X.2010.04820.x.) uses differences in allele frequencies between populations to identify Fst-outlier loci. The null model (distribution of Fst for neutral loci) is generated based on the multinomial-Dirichlet model (island model) which assumes allele frequencies of subpopulations are correlated through a common migrant gene pool from which they differ to varying degrees; this difference is measured by a subpopulation-specific Fst coefficient. This model can incorporate ecologically more realistic scenarios compared to a strict island model and is a robust method even when there are differing effective sizes and migration rates per subpopulation.
Selection is introduced by decomposing population-specific Fst coefficients into population-specific component (beta), shared by all loci, and locus-specific component (alpha), shared by all poulations using logistic regresstion. If locus-specific component is necessary to explain observed pattern of diversity (i.e. alpha significantly different from 0), departure from neutrality is inferred.
BayeScan calculates a posterior probability for model including selection using bayes factors. Multiple testing is needed to incorporate identifying loci as under selection by chance. The posterior odds (instead of bayes factors) are calculated to make the decision on whether loci should be considered outlier. Posterior odds are calculated as the ratio of posterior probabilities indicating how much more likely the model with selection is compared to netural model. Posterior probabilities can be used to directly control FDR (expected proportion of false positives amont outlier loci). q-value of each locus indicates the minimum FDR at which this locus may become significant, e.g. q-value > 0.05 means that 5% of corresponding outlier markers are expected to be false positives (5% threshold more stringent than corresponding p-value in classic statistics).
Use PGDSpider to convert genepop file to bayescan format.
# input file with individuals grouped by sample location
java -jar /usr/local/bin/PGDSpider2-cli.jar -inputfile data/POPGEN/BMA_by_pop_genepop.gen -inputformat GENEPOP -outputfile data/POPGEN/BMA.genotypes_bayescan.txt -outputformat GESTE_BAYE_SCAN -spid scr/genepop2bayscan.spid
Get list of loci in data set to be able to identify outlier loci by name.
Prior odds for neutral model are determined using -pr_odds and indicate how much more likely the neutral model (does not include locus-specific effect alpha) is compared to the model of select (includes locus-specific effect alpha). Prior odds = 10 indicates that the neutral model is 10x more likely. The test of selection becomes more conservative/less false positive outlier are detected with increasing prior odds. For large amount of data (loci) including many populations and individuals per population, the prior odds should have little influence on on results but for realistic data sets (< 20 populations), the choice of prior odds can have strong effect and should be adjusted based on the number of loci included in the data set. For < 1,000 loci, prior odds = 10 is reasonable for larger data sets (1,000 = 10,000), prior odds = 100 is more appropriate, for millions of markers prior odds = 10,000 may be necessary.
Short, successive pilot runs (-nbp) are used to to adjust acceptance rates between 0.25 - 0.45 (-pilot determines iterations) and choose proposal distribution for reversible jump by estimating mean and variance for all alpha’s under the saturated model (contains all alpha parematers), which is close to full conditional distribution and generally creates conservative enough parameters to ensure convergence.
Sample size (-n) corresponds to number of iterations the program will write out/use for estimation of parameters after initial burn-in (-burn).
Thinning interval (-thin) is number of intervals between two samples; reduces autocorrelation from data generated using Markov chain
Total number of interations is (sample size x thinning interval) + burn-in.
# run for individuals grouped by sample location
BayeScan2.1_linux64bits data/POPGEN/BMA.genotypes_bayescan.txt -od results/ -o BMA_pop.bayescanPO10kburn100kn10kthin50 -all_trace -threads 45 -n 10000 -thin 50 -nbp 25 -pilot 5000 -burn 50000 -pr_odds 10000 -out_pilot -out_freq
Parameters used:
## [1] "There are 5554 loci."
## [2] ""
## [3] "There are 9 populations."
## [4] ""
## [5] "Burn in: 50000"
## [6] "Thining interval: 50"
## [7] "Sample size: 10000"
## [8] "Resulting total number of iterations: 550000"
## [9] "Nb of pilot runs: 25"
## [10] "Length of each pilot run: 5000"
## [11] ""
Plot posterior distributions:
Full output of MCMC algorithm is in *.sel file. Each line corresponds to iteration of MCMC algorithm, columns contain iteration index (start after pilot runs/burn-in), logarithm of likelihood, Fst coefficient for every population, alpha coefficients for every locus.
Counting the null values of alpha gives the posterior probability for the neutral model (only written out if -all_trace flag enabled).
Trace of likelihood and values of Fst over iterations.
Verify that sample size used to estimate posteriors is sufficiently large. Effective sample size to estimate parameters can be smaller than value used for BayeScan run (10,000). MCMC explores the parameter space by moving in small steps. Therefore, two consecutive values will be strongly correlated; used thinning interval of 50 to reduce autocorrelation.
Check correlation between sampled parameter values for thinned chains used to estimate posterior probability. Effective sample size will be smaller than sample size used as input (10,000) if there is some correlation.
## 'mcmc' num [1:9999, 1:5565] 50050 50100 50150 50200 50250 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:5565] "iteration" "logL" "fst1" "fst2" ...
## - attr(*, "mcpar")= num [1:3] 1 499901 50
Comparison of effective sample sizes for fst, alpha, and likelihood
Effective size of likelihood sample should be smaller than input value of 10,000, while Fst parameters less affected by correlation because correlation decreases more rapidly for Fst values than for likelihood.
Test for convergence
Test for non-convergence of chains using Geweke’s convergence diagnostic which compares the means of the first and last parts of the MC and reports the z-scores for each parameter.
For α = 0.05, critical values of z are – 1.96 and +1.96, i.e. if z values fall within those boundaries indicative of equality of means and therefore convergence of MCMC. On the otherhand z < -1.96 or z > 1.96 null hypothesis of equality of means should be rejected.
Comparison of z values for fst, alpha, and log likelihood.
Compare Fst, alpha, and q-values
File *_fst.txt contains one locus per row (first column). Columns 2 - 4 correspond to posterior probability for model including selection (prob), log10 of posterior odds for model including slection (log10PO), q-value for model including selection (qval); these are related to the test of local adaptation, i.e. model including locus-specific effect alpha. Fifth column is estimated locus-specific effect alpha (alpha) which indicates the strength and direction (positive values indicate diversifying selection). Final column is the locus-specific Fst coefficient averaged over populations (fst).
Use q-value to determine identify loci putatively under the influence of selection.
Distribution of q-values.
Number of loci with q-value < 0.05
| qval <= 0.05 | n |
|---|---|
| FALSE | 5539 |
| TRUE | 15 |
Number of loci with q-value < 0.01
| qval <= 0.01 | n |
|---|---|
| FALSE | 5541 |
| TRUE | 13 |
Number of loci with q-value < 0.001
| qval <= 0.001 | n |
|---|---|
| FALSE | 5542 |
| TRUE | 12 |
Distribution of estimated Fst and locus-specific alpha component
Distribution of estimated Fst and locus-specific alpha component.
Identify outlier loci
Identify loci with q-value < 0.05.
Relationship log10(qvalue) and Fst per locus
A total of 15 Fst-outlier loci were identified using bayescan (p < 0.05).
The FDIST method (Beaumont and Nichols 1996Beaumont, M. A., and R. A. Nichols. 1996. “Evaluating loci for use in the genetic analysis of population structure.” Proceedings of the Royal Society B: Biological Sciences 263 (1377): 1619–26. https://doi.org/10.1098/rspb.1996.0237.; Excoffier and Lischer 2010Excoffier, Laurent, and Heidi E. L. Lischer. 2010. “Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows.” Molecular Ecology Resources 10 (3): 564–67. https://doi.org/10.1111/j.1755-0998.2010.02847.x.) is implemented in Arlequin using a strict island model. Coalescent simulations basd on the distribution of heterozygosity in the empirical data set are used to create null distributions for Fst for a given number of demes.
Run FDIST method implmented in Arlequin
Use PGDSpider to convert genepop to arlequin (*.arp) files. Need to define Structure of demes in order for Arlequin to run. Assume all in one group. Use Arlequin to create settings files; make sure linux line endings and *.ars settings files are set as executable.
Run all models for 20,000 simulations. For island model simulate 3, 4, 9, 25, 50, and 100 demes. Execute bash script to run different combinations of input files and settings files and move/rename output files into results directory.
./scr/runFDIST_Arlequin.sh
Evaluate simulated and observed Fst-He distributions
Fst-heterozygosity distribution for simulated and observed data set using nine demes (number of putative populations assumed in data set). Color of individual points depict levels significance (darker colors indicate smaller values).
Identify outlier loci potentially under selection and compare patterns of variance
Identify loci with corrected p-value < 0.05.
A total of 48 Fst-outlier loci were identified using arlequin (p < 0.05).
Identify sets of loci shared between methods.
Comparison of set sizes per outlier method and intersecting sets of loci.
Redundancy analysis is an ordination method to determine how much variation of one set of variables (explanatory variables) explains the variation in another set of variables (response variables), i.e. it is the multivariate analog of a simple linear regression. It assumes that the relationship between the dependent (response) and independent (explanatory) variables is linear.
Here, it can be applied to understand which geographic and/or environmental factors explain the overall observed pattern of genetic variation on the landscape (Oksanen et al. 2010Oksanen, Jari, Roeland Kindt, Pierre Legendre, Bob O Hara, Gavin L Simpson, Petery Solymos, M Henry H Stevens, and Helene Wagner. 2010. “vegan: Community Ecology Package. R package version 1.17-3.” October 10 (01): 2008. http:/.; Forester et al. 2016Forester, Brenna R., Matthew R. Jones, Stéphane Joost, Erin L. Landguth, and Jesse R. Lasky. 2016. “Detecting spatial genetic signatures of local adaptation in heterogeneous landscapes.” Molecular Ecology 25 (1): 104–20. https://doi.org/10.1111/mec.13476., 2018Forester, Brenna R., Jesse R. Lasky, Helene H. Wagner, and Dean L. Urban. 2018. “Comparing methods for detecting multilocus adaptation with multivariate genotype–environment associations.” Molecular Ecology 27 (9): 2215–33. https://doi.org/10.1111/mec.14584.).
Response variables: Genotypes
Format response variables (genetic data) as allele counts per locus and individual, homozygous calls are coded as 0 or 2, heterozygotes as 1. RDA requires a complete data set, therefore missing values are replaced with mean allele frequency.
All alleles are coded as 0, 1, or 2 therefore, scaling/centering should not be necessary.
setPop(gen) <- ~POP
# replace missing data
nomissing <- missingno(gen, type = "mean")
# extract allele counts
allelecount <- as.data.frame(tab(nomissing))
The repsonse variables consist of 18606 loci genotyped for 367 individuals across 9 samples estuaries.
Explanatory variables: Moran’s Eigenvector Maps (MEMs)
Calculation of spatial eigenvector maps requires a distance matrix. Use great circle distance calculated using lat/long information for individual samples. Use this matrix to calculate db-MEMs. Keep default setting for truncation, i.e. the length of the longest edge of the minimum spanning tree will be used as the threshold. Only positive MEMs are retained.
MEMs are orthogonal vectors with a unit norm that maximizes spatial autocorrelation (Griffith 1996Griffith, Daniel A. 1996. “Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data.” Canadian Geographer 40 (4): 351–67. https://doi.org/10.1111/j.1541-0064.1996.tb00462.x.; Dray et al. 2012Dray, S., R. Pélissier, P. Couteron, M. J. Fortin, P. Legendre, P. R. Peres-Neto, E. Bellier, et al. 2012. “Community ecology in the age of multivariate multiscale spatial analysis.” John Wiley & Sons, Ltd. https://doi.org/10.1890/11-1183.1.).
This gives us 6 MEMs.
Plot them using adegraphics
Visualize the MEM values - 1st MEMs are large spatial scales, spatial scales increasingly smaller for smaller MEMs (this is independent of genetic patterns; purely spatial/sampling design).
Fig 2: dbMEM values per samples. Positive values are blue, negative values are yellow, the size of the circles are scaled by magnitude.
Model selection
Select best model for spatial variables using Forward model selection using permutation tests on rda object, a model is defined and a given scope of models considered. The function alternates with drop and add steps and stops when model is not changed during one step.
# run initial rda
rda.xy = rda(allelecount ~ ., data.frame(dbMEM), scale = FALSE)
stp.xy = ordiR2step(rda(allelecount ~ 1, data.frame(dbMEM)),
scope = formula(rda.xy),
R2scope = TRUE,
scale = FALSE,
Pin = 0.05,
Pout = 0.1,
direction="both",
R2permutations = 999,
parallel = 45)
## Step: R2.adj= 0
## Call: allelecount ~ 1
##
## R2.adjusted
## <All variables> 0.00794247207
## + MEM1 0.00536696607
## + MEM3 0.00108849267
## + MEM2 0.00094741207
## + MEM4 0.00036982618
## + MEM5 0.00013556168
## <none> 0.00000000000
## + MEM6 -0.00007458758
##
## Df AIC F Pr(>F)
## + MEM1 1 2956.5 2.9749 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.005366966
## Call: allelecount ~ MEM1
##
## R2.adjusted
## <All variables> 0.007942472
## + MEM3 0.006473194
## + MEM2 0.006331725
## + MEM4 0.005752553
## + MEM5 0.005517645
## <none> 0.005366966
## + MEM6 0.005306918
##
## Df AIC F Pr(>F)
## + MEM3 1 2957.1 1.4064 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.006473194
## Call: allelecount ~ MEM1 + MEM3
##
## R2.adjusted
## <All variables> 0.007942472
## + MEM2 0.007443658
## + MEM4 0.006862890
## + MEM5 0.006627335
## <none> 0.006473194
## + MEM6 0.006416027
##
## Df AIC F Pr(>F)
## + MEM2 1 2957.8 1.3559 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.007443658
## Call: allelecount ~ MEM1 + MEM3 + MEM2
##
## R2.adjusted
## <All variables> 0.007942472
## + MEM4 0.007837112
## + MEM5 0.007600906
## <none> 0.007443658
## + MEM6 0.007389015
##
## Df AIC F Pr(>F)
## + MEM4 1 2958.6 1.144 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.007837112
## Call: allelecount ~ MEM1 + MEM3 + MEM2 + MEM4
##
## R2.adjusted
## + MEM5 0.007995885
## <All variables> 0.007942472
## <none> 0.007837112
## + MEM6 0.007783407
Run RDA
The selected model consists of MEM1, MEM3, MEM2, and MEM4.
xy_selected <- dbMEM %>%
as.data.frame() %>%
select(one_of(attributes(stp.xy$terms)$term.labels))
rda.xy <- rda(allelecount ~ ., data = xy_selected, scale = FALSE)
Determine proportion of genetic variance explained by geographic data by running RDA using the selected spatial variables as the explanatory variables, i.e. fitting linear trend surface (accounting for linear geographic scale).
## Call: rda(formula = allelecount ~ MEM1 + MEM3 + MEM2 + MEM4, data =
## xy_selected, scale = FALSE)
##
## Inertia Proportion Rank
## Total 3152.38816 1.00000
## Constrained 58.88794 0.01868 4
## Unconstrained 3093.50022 0.98132 362
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4
## 28.865 12.301 9.112 8.610
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 33.03 26.05 13.72 13.66 13.19 13.16 13.07 13.04
## (Showing 8 of 362 unconstrained eigenvalues)
Test for significance using a permutation test to generate p-value indicating whether to reflect null hypothesis (geography does not explain genetic variation). Rows of unconstrained matrix (genetic data set) are repeatedly randomized for n permutations. Relationship is considered significant if observed relationship is stronger than the randomly permuted relationships.
anova.cca(rda.xy,
permutations = 1000,
parallel = 45)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 1000
##
## Model: rda(formula = allelecount ~ MEM1 + MEM3 + MEM2 + MEM4, data = xy_selected, scale = FALSE)
## Df Variance F Pr(>F)
## Model 4 58.89 1.7228 0.000999 ***
## Residual 362 3093.50
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Proportion of variance explained by the environmental predictors is Proportion column of Constrained row in summary table (equivalent to R2 of multiple regression).
R2 (0.0187) needs to be adjusted based on the number of predictors, as the number of explanatory variables inflates the apparent amount of explained variance due to random correlation. The adjusted R2 values is 0.0078
## Importance of components:
## RDA1 RDA2 RDA3 RDA4
## Eigenvalue 28.8651 12.3006 9.1121 8.6101
## Proportion Explained 0.4902 0.2089 0.1547 0.1462
## Cumulative Proportion 0.4902 0.6991 0.8538 1.0000
Fig 3: Eigenvalues for the constrained axes reflect the variance explained by each canonical axis
Compare constraints (explanatory variables).
Fig 4: Biplot of constraints.
Use Mahalanobis distance to identify alleles with strongest association to first three RDA axis.
Fig 5: Distribution of alleles; alleles from loci with at least one allele flagged as non-associated (grey) vs outlier (red) based on Mahalanobis distance threshold of 18.46 for four degrees of freedom (p < 0.001).
Table 1a: Number of loci significantly associated allele (for bivariate distribution critical value = 13.82 (p < 0.001))
| MAH_DIST >= 18.46 | n |
|---|---|
| FALSE | 5166 |
| TRUE | 388 |
Table 1b: Number of loci significantly associated allele (for bivariate distribution critical value = 25 (p < 0.001))
| MAH_DIST >= 25 | n |
|---|---|
| FALSE | 5396 |
| TRUE | 158 |
Compare clustering of individual based on weighted average individual scores, i.e. weighted averages of allele scores that are as similar to linear combination scores as possible. Weights are individual totals of alleles. To determine how well explanatory variables separate groups of individuals or if explanatory variables can be used to discriminate between groups of individuals, use wa-scores.
Fig 6: RDA biplot indicating spatial variables (arrows) and clustering of individuals on RDA1-3.
Response variables: Genotypes
Format response variables (genetic data) as allele counts per locus and individual, homozygous calls are coded as 0 or 2, heterozygotes as 1. RDA requires a complete data set, therefore missing values are replaced with mean allele frequency.
All alleles are coded as 0, 1, or 2 therefore, scaling/centering should not be necessary.
setPop(gen) <- ~POP
# replace missing data
nomissing <- missingno(gen, type = "mean")
# extract allele counts
allelecount <- as.data.frame(tab(nomissing))
The repsonse variables consist of 18606 loci genotyped for 367 individuals across 9 samples estuaries.
Explanatory variables: Bioclim variables
Download & Format environmental parameters
# access marine data sets
marine <- list_datasets(terrestrial = FALSE, marine = TRUE)
# identify available layers & break into sets
layers <- list_layers(marine) %>%
separate(layer_code, into = c("dataset", "parameter", "stat"), extra = "merge", remove = FALSE) %>%
filter(!dataset == "BO2") %>%
mutate(SET = case_when(dataset == "MS" ~ "MARSPEC",
dataset == "BO" ~ "BIO_CLIM",
grepl("phytoplankton", description, ignore.case = TRUE) ~ "carb_phytopl",
grepl("Chlorophyll", description, ignore.case = TRUE) ~ "chlorophyll",
grepl("velocity", description, ignore.case = TRUE) ~ "velocity",
grepl("oxygen", description, ignore.case = TRUE) ~ "DO",
grepl("Iron", description, ignore.case = TRUE) ~ "Iron",
grepl("Light", description, ignore.case = TRUE) ~ "Light",
grepl("Nitrate", description, ignore.case = TRUE) ~ "Nitrate",
grepl("Phosphate", description, ignore.case = TRUE) ~ "Phosphate",
grepl("Primary", description, ignore.case = TRUE) ~ "PrimaryProd",
grepl("ice", description, ignore.case = TRUE) ~ "SeaIce",
grepl("surface", description, ignore.case = TRUE) ~ "SeaSurface",
grepl("water", description, ignore.case = TRUE) ~ "SeaWater",
grepl("Silicate", description, ignore.case = TRUE) ~ "Silicate"))
Loop through layers to download and extract values for coordinates.
# vector with sets
set <- unique(layers$SET)
set <- set[1:2]
for(s in set){
cat(s,"start download ...")
# data frame for results
env <- strata %>%
dplyr::select(LIB_ID, POP, LAT, LONG)
# subset layers
layer_subset <- layers %>%
filter(SET == s)
# filename
path <- glue("data/POPGEN/{s}_env.param")
# download all the layers in the set
for(l in layer_subset$layer_code){
# download layer
env_layer <- load_layers(l)
# coordinates
xy <- xy
# extract values for xy coordinates
param <- raster::extract(x = env_layer,
y = xy,
method = "bilinear") %>%
as.data.frame() %>%
magrittr::set_names(l)
# combine results
env <- env %>%
bind_cols(param)
# write to file
write_delim(env, path, delim = "\t")
}
}
Load variables for all sets of variables.
Model selection
We are going to use only the original BIO_CLIM and MARSPEC data sets.
Table 2: MARSPEC and BIO-CLIM variables used for model selection.
| layer_code | description |
|---|---|
| BO_calcite | Calcite concentration indicates the mean concentration of calcite (CaCO3) in oceans. |
| BO_chlomax | Chlorophyll A concentration indicates the concentration of photosynthetic pigment chlorophyll A (the most common green chlorophyll) in oceans. Please note that in shallow water these values may reflect any kind of autotrophic biomass. |
| BO_chlomean | Chlorophyll A concentration indicates the concentration of photosynthetic pigment chlorophyll A (the most common green chlorophyll) in oceans. Please note that in shallow water these values may reflect any kind of autotrophic biomass. |
| BO_chlomin | Chlorophyll A concentration indicates the concentration of photosynthetic pigment chlorophyll A (the most common green chlorophyll) in oceans. Please note that in shallow water these values may reflect any kind of autotrophic biomass. |
| BO_chlorange | Chlorophyll A concentration indicates the concentration of photosynthetic pigment chlorophyll A (the most common green chlorophyll) in oceans. Please note that in shallow water these values may reflect any kind of autotrophic biomass. |
| BO_cloudmean | Cloud fraction indicates how much of the earth is covered by clouds. |
| BO_cloudmin | Cloud fraction indicates how much of the earth is covered by clouds. |
| BO_damax | The diffuse attenuation coefficient is an indicator of water clarity. It expresses how deeply visible light in the blue to the green region of the spectrum penetrates in to the water column. |
| BO_damean | The diffuse attenuation coefficient is an indicator of water clarity. It expresses how deeply visible light in the blue to the green region of the spectrum penetrates in to the water column. |
| BO_damin | The diffuse attenuation coefficient is an indicator of water clarity. It expresses how deeply visible light in the blue to the green region of the spectrum penetrates in to the water column. |
| BO_dissox | Dissolved oxygen concentration [02] |
| BO_nitrate | This layer contains both [NO3] and [NO3+NO2] data. By this we mean chemically reactive dissolved inorganic nitrate and nitrate or nitrite. (It is important to note that data reported as [NO3] in the WOD09 should be used with caution because it is difficult to verify that the [NO3] (nitrate) data are [NO3+NO2] or [NO3]. (Boyer et al. 2009)) |
| BO_parmax | Photosynthetically Available Radiation (PAR) indicates the quantum energy flux from the Sun (in the spectral range 400-700 nm) reaching the ocean surface. |
| BO_parmean | Photosynthetically Available Radiation (PAR) indicates the quantum energy flux from the Sun (in the spectral range 400-700 nm) reaching the ocean surface. |
| BO_ph | Measure of acidity in the ocean. |
| BO_phosphate | Reactive ortho-phosphate concentration [HPO4^-2] in the ocean. |
| BO_salinity | Salinity indicates the dissolved salt content in the ocean. |
| BO_silicate | This variable indicates the concentration of silicate or ortho-silicic acid [Si(OH)4] in the ocean |
| BO_sstmax | Sea surface temperature is the temperature of the water at the ocean surface. This parameter indicates the temperature of the topmost meter of the ocean water column. |
| BO_sstmean | Sea surface temperature is the temperature of the water at the ocean surface. This parameter indicates the temperature of the topmost meter of the ocean water column. |
| BO_sstmin | Sea surface temperature is the temperature of the water at the ocean surface. This parameter indicates the temperature of the topmost meter of the ocean water column. |
| BO_sstrange | Sea surface temperature is the temperature of the water at the ocean surface. This parameter indicates the temperature of the topmost meter of the ocean water column. |
| BO_bathymin | Minimum depth of the seafloor |
| BO_bathymax | Maximum depth of the seafloor |
| BO_bathymean | Average depth of the seafloor |
| MS_bathy_5m | Depth of the seafloor |
| MS_biogeo01_aspect_EW_5m | East/West Aspect (sin(aspect in radians)) |
| MS_biogeo02_aspect_NS_5m | North/South Aspect (cos(aspect in radians)) |
| MS_biogeo03_plan_curvature_5m | Plan curvature is the curvature in the direction perpendicular to the maximum slope and indicates whether flow across a surface would diverge (positive values) or converge (negative values). |
| MS_biogeo04_profile_curvature_5m | Profile curvature is the curvature in the direction parallel to the maximum slope and indicates whether flow across a surface would accelerate (positive values) or decelerate (negative values). |
| MS_biogeo05_dist_shore_5m | ’’ |
| MS_biogeo06_bathy_slope_5m | Bathymetric slope was measured in degrees ranging from 0??? (flat surface) to 90??? (vertical slope). |
| MS_biogeo07_concavity_5m | Concavity is the second derivative of the bathymetry layer (or the slope of the slope) and represents whether a raster cell is on a hill (negative values) or in a valley (positive values). |
| MS_biogeo08_sss_mean_5m | Measurements of sea surface salinity (SSS) were obtained from in situ oceanographic observations compiled by NOAA?s World Ocean Atlas 2009 (WOA09, Antonov et al. 2010). |
| MS_biogeo09_sss_min_5m | Measurements of sea surface salinity (SSS) were obtained from in situ oceanographic observations compiled by NOAA?s World Ocean Atlas 2009 (WOA09, Antonov et al. 2010). |
| MS_biogeo10_sss_max_5m | Measurements of sea surface salinity (SSS) were obtained from in situ oceanographic observations compiled by NOAA?s World Ocean Atlas 2009 (WOA09, Antonov et al. 2010). |
| MS_biogeo11_sss_range_5m | Measurements of sea surface salinity (SSS) were obtained from in situ oceanographic observations compiled by NOAA?s World Ocean Atlas 2009 (WOA09, Antonov et al. 2010). |
| MS_biogeo12_sss_variance_5m | Measurements of sea surface salinity (SSS) were obtained from in situ oceanographic observations compiled by NOAA?s World Ocean Atlas 2009 (WOA09, Antonov et al. 2010). |
| MS_biogeo13_sst_mean_5m | Satellite measures of sea surface temperature (SST) were obtained at a 2.5 arc-minute resolution (approximately 4 km_) from Aqua-MODIS 4-micron nighttime SST Level 3 standard mapped image products, downloaded from NASA’s Ocean Color website (http://oceancolor.gsfc.nasa.gov/). |
| MS_biogeo14_sst_min_5m | Satellite measures of sea surface temperature (SST) were obtained at a 2.5 arc-minute resolution (approximately 4 km_) from Aqua-MODIS 4-micron nighttime SST Level 3 standard mapped image products, downloaded from NASA’s Ocean Color website (http://oceancolor.gsfc.nasa.gov/). |
| MS_biogeo15_sst_max_5m | Satellite measures of sea surface temperature (SST) were obtained at a 2.5 arc-minute resolution (approximately 4 km_) from Aqua-MODIS 4-micron nighttime SST Level 3 standard mapped image products, downloaded from NASA’s Ocean Color website (http://oceancolor.gsfc.nasa.gov/). |
| MS_biogeo16_sst_range_5m | Satellite measures of sea surface temperature (SST) were obtained at a 2.5 arc-minute resolution (approximately 4 km_) from Aqua-MODIS 4-micron nighttime SST Level 3 standard mapped image products, downloaded from NASA’s Ocean Color website (http://oceancolor.gsfc.nasa.gov/). |
| MS_biogeo17_sst_variance_5m | Satellite measures of sea surface temperature (SST) were obtained at a 2.5 arc-minute resolution (approximately 4 km_) from Aqua-MODIS 4-micron nighttime SST Level 3 standard mapped image products, downloaded from NASA’s Ocean Color website (http://oceancolor.gsfc.nasa.gov/). |
| MS_sss01_5m | Average sea surface salinity for the month january. |
| MS_sss02_5m | Average sea surface salinity for the month february. |
| MS_sss03_5m | Average sea surface salinity for the month march |
| MS_sss04_5m | Average sea surface salinity for the month april. |
| MS_sss05_5m | Average sea surface salinity for the month may. |
| MS_sss06_5m | Average sea surface salinity for the month june. |
| MS_sss07_5m | Average sea surface salinity for the month july. |
| MS_sss08_5m | Average sea surface salinity for the month august. |
| MS_sss09_5m | Average sea surface salinity for the month september. |
| MS_sss10_5m | Average sea surface salinity for the month october. |
| MS_sss11_5m | Average sea surface salinity for the month november. |
| MS_sss12_5m | Average sea surface salinity for the month december. |
| MS_sst01_5m | Average sea surface temperature for the month january. |
| MS_sst02_5m | Average sea surface temperature for the month february. |
| MS_sst03_5m | Average sea surface temperature for the month march |
| MS_sst04_5m | Average sea surface temperature for the month april. |
| MS_sst05_5m | Average sea surface temperature for the month may. |
| MS_sst06_5m | Average sea surface temperature for the month june. |
| MS_sst07_5m | Average sea surface temperature for the month july. |
| MS_sst08_5m | Average sea surface temperature for the month august. |
| MS_sst09_5m | Average sea surface temperature for the month september. |
| MS_sst10_5m | Average sea surface temperature for the month october. |
| MS_sst11_5m | Average sea surface temperature for the month november. |
| MS_sst12_5m | Average sea surface temperature for the month december. |
Run model selection using stepwise process.
# format data for rda
env_param <- env_param_tidy %>%
filter(PARAMETER %in% select$layer_code) %>%
group_by(PARAMETER) %>%
mutate(VALUES = replace(VALUES, is.na(VALUES), mean(VALUES, na.rm=TRUE))) %>%
pivot_wider(names_from = "PARAMETER", values_from = "VALUES") %>%
column_to_rownames("LIB_ID")
write_delim(env_param, "data/POPGEN/bioclim.param", delim = "\t")
# run initial rda with all parameters
rda <- rda(allelecount ~ ., data = env_param, scale = TRUE)
# stepwise selection
stp.env = ordiR2step(rda(allelecount ~ 1, env_param),
scope = formula(rda),
R2scope = TRUE,
scale = FALSE,
Pin = 0.05,
Pout = 0.1,
direction="both",
R2permutations = 999,
parallel = 35)
Select best model for spatial variables using step-wise model selection based on permutation tests of rda object, a model is defined and a given scope of models considered. The function alternates with drop and add steps and stops when model is not changed during one step.
The best model consists of MS_sst06_5m, MS_bathy_5m, BO_phosphate, BO_dissox, and BO_parmean.
Distribution of values per sample location for each parameter selected for the environmental RDA model.
Table 3: Variables selected using stepwise model selection based on permutation tests of rda object for Pin = 0.05 and Pout = 0.1.
| layer_code | description |
|---|---|
| BO_dissox | Dissolved oxygen concentration [02] |
| BO_parmean | Photosynthetically Available Radiation (PAR) indicates the quantum energy flux from the Sun (in the spectral range 400-700 nm) reaching the ocean surface. |
| BO_phosphate | Reactive ortho-phosphate concentration [HPO4^-2] in the ocean. |
| MS_bathy_5m | Depth of the seafloor |
| MS_sst06_5m | Average sea surface temperature for the month june. |
Run RDA
Run RDA using selected principle components and test for significance.
env_model <- env_param_tidy %>%
filter(PARAMETER %in% selected) %>%
group_by(PARAMETER) %>%
mutate(VALUES = replace(VALUES, is.na(VALUES), mean(VALUES, na.rm=TRUE))) %>%
pivot_wider(names_from = "PARAMETER", values_from = "VALUES") %>%
column_to_rownames("LIB_ID")
rda.env <- rda(allelecount ~ ., data = env_model, scale = TRUE)
Determine the proportion of genetic variance explained by geographic data by running RDA using the selected environemtal variables as the explanatory variables.
## Call: rda(formula = allelecount ~ BO_dissox + BO_parmean + BO_phosphate
## + MS_bathy_5m + MS_sst06_5m, data = env_model, scale = TRUE)
##
## Inertia Proportion Rank
## Total 18606.00000 1.00000
## Constrained 540.27325 0.02904 5
## Unconstrained 18065.72675 0.97096 361
## Inertia is correlations
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5
## 188.06 157.52 87.76 57.92 49.02
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 93.17 87.13 83.63 82.72 81.91 80.67 80.36 80.11
## (Showing 8 of 361 unconstrained eigenvalues)
Test for significance using a permutation test to generate p-value indicating whether to reflect the null hypothesis (environmental data does not explain genetic variation). Rows of unconstrained matrix (genetic data set) are repeatedly randomized for n permutations. Relationship is considered significant if observed relationship is stronger than the randomly permuted relationships.
anova.cca(rda.env,
permutations = 1000,
parallel = 45)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 1000
##
## Model: rda(formula = allelecount ~ BO_dissox + BO_parmean + BO_phosphate + MS_bathy_5m + MS_sst06_5m, data = env_model, scale = TRUE)
## Df Variance F Pr(>F)
## Model 5 540.3 2.1592 0.000999 ***
## Residual 361 18065.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The adjusted R2 values is 0.0156
Compare results
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5
## Eigenvalue 188.0583 157.5159 87.7640 57.9200 49.01506
## Proportion Explained 0.3481 0.2915 0.1624 0.1072 0.09072
## Cumulative Proportion 0.3481 0.6396 0.8021 0.9093 1.00000
Fig 7: Eigenvalues for the constrained axes reflect the variance explained by each canonical axis.
Compare constraints (explanatory variables).
Fig 8: Biplot of constraints.
Use Mahalanobis distance to identify alleles with strongest association to first two RDA axis.
Fig 9: Distribution of alleles; alleles from loci with at least one allele flagged as non-associated (grey) vs outlier (red) based on Mahalanobis distance threshold of 20.51 for p < .001, (5 degrees of freedom)
Table 4a: Number of loci significantly associated allele (for bivariate distribution critical value = 13.82 (p < 0.001))
| MAH_DIST >= 20.51 | n |
|---|---|
| FALSE | 5436 |
| TRUE | 118 |
Table 4b: Number of loci significantly associated allele.
| MAH_DIST >= 25 | n |
|---|---|
| FALSE | 5503 |
| TRUE | 51 |
Compare clustering of individual based on weighted average individual scores, i.e. weighted averages of allele scores that are as similar to linear combination scores as possible. Weights are individual totals of alleles. To determine how well explanatory variables separate groups of individuals or if explanatory variables can be used to discriminate between groups of individuals, use wa-scores.
Fig 10: RDA biplot indicating environmental variables (arrows) and clustering of individuals.
Perform RDA for all variables (spatial and environmental; full model).
## Call: rda(formula = allelecount ~ xy.mat + env.mat, scale = FALSE)
##
## Inertia Proportion Rank
## Total 3152.38816 1.00000
## Constrained 138.87963 0.04406 9
## Unconstrained 3013.50853 0.95594 357
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9
## 40.77 27.70 18.40 9.15 8.88 8.62 8.59 8.46 8.31
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 13.812 13.580 13.421 13.158 13.045 13.017 12.984 12.895
## (Showing 8 of 357 unconstrained eigenvalues)
Test for significance of overall model.
The overall model is significant (P = 0.001).
Partition the variation in genetic data into components accounted for by environmental and spatial variables and their combined effect using adjusted R squared to assess partitions explained by explanatory tables and their combinations.
##
## Partition of variance in RDA
##
## Call: varpart(Y = allelecount, X = ~xy.mat, ~env.mat)
##
## Explanatory tables:
## X1: ~xy.mat
## X2: ~env.mat
##
## No. of explanatory tables: 2
## Total variation (SS): 1153774
## Variance: 3152.4
## No. of observations: 367
##
## Partition table:
## Df R.squared Adj.R.squared Testable
## [a+b] = X1 4 0.01868 0.00784 TRUE
## [b+c] = X2 5 0.03231 0.01890 TRUE
## [a+b+c] = X1+X2 9 0.04406 0.01996 TRUE
## Individual fractions
## [a] = X1|X2 4 0.00105 TRUE
## [b] 0 0.00678 FALSE
## [c] = X2|X1 5 0.01212 TRUE
## [d] = Residuals 0.98004 FALSE
## ---
## Use function 'rda' to test significance of fractions of interest
Extract fraction explained by each component:
Test significance of all components using permuted p-values.
Variance partitioning of variance explained by db_MEMs (xy), environmental variables (env), and shared effects due to correlation of xy and env.
| PARTITION | VARIANCE | PVAL |
|---|---|---|
| residuals | 0.9800441 | NA |
| xy+env+shared | 0.0199559 | 0.001 |
| env+shared | 0.0189025 | 0.001 |
| env | 0.0121188 | 0.001 |
| xy+shared | 0.0078371 | 0.001 |
| shared | 0.0067837 | NA |
| xy | 0.0010534 | 0.001 |
Tidy data set containing loci (contigs) flagged as Fst-outlier for each method, and loci flagged as strongly associated with first three RDA axis for spatial and environmental RDAs and identify sets of loci shared between methods.
Fig. 14: Comparison of set sizes per outlier method and intersecting sets of loci.
Fig. 14: Comparison of set sizes per outlier method and intersecting sets of loci.
Outlier loci were defined as loci identified by at least one of the Fst-outlier methods or the RDA.
The outlier data set consists of 133loci.
Create neutral data set as loci remaining after all outlier loci are removed from data set.
The neutral data set consists of 5421 loci.
K-means clustering can be used for model-free clustering based on similarity to identify clusters of individuals. Discriminant analysis of principle components (DAPC) implemented in adegenet (Jombart, Devillard, and Balloux 2010Jombart, Thibaut, Sébastien Devillard, and François Balloux. 2010. “Discriminant analysis of principal components: a new method for the analysis of genetically structured populations.” BMC Genetics 11 (1): 94. https://doi.org/10.1186/1471-2156-11-94.) can then used to determine membership probabilities of each sample to each inferred cluster. During DAPC, the genotype matrix is first PCA-transformed, samples are paritioned into a within- and between-group component to maximize discrimination between groups using linear discriminant analysis; the discriminant functions can be expressed as linear combinations of alleles and therefore, allele contributions to the identified patterns can be computed.
Use k-means clustering to identify clusters based on genetic similarity (data transformed using PCA). For clustering it is appropriate to retain all PCs (all variance in the data set), as overfitting is not an issue.
Fig 1: AIC for data clustered into K = 1 - 40 groups using k-means clustering.
Minimum value for AIC is obtained for K=10.
Perform stratified cross-validation of DAPC using a range of retained PCs, while keeping the number of discriminant functions fixed for K = 2 - 5 to determine the most appropriate number of principle components to retain sufficient variance to discriminate but not overfit the data.
Retaining too few PCs may result in important variance not being retained and therefore not informing the clustering analysis, while retaining too many PCs results in overfitting, i.e. assignment success decreases.
Fig 2: Proportion of test individuals correctly assigned to their cluster of origin based on number of retained PCs for K = 2 - 9.
The optimum number of principle components, %-variance retained and the mean optimum assignment succes for K = 2 - 10 clusters.
| k_clust | optPC | variance | mean_opt_success |
|---|---|---|---|
| 2 | 50 | 70.17 | 1.00 |
| 3 | 50 | 47.00 | 0.98 |
| 4 | 50 | 47.00 | 0.98 |
| 5 | 100 | 70.17 | 0.94 |
Genetic data is scaled and centered, then transformed using a PCA. Retained PCs are then transmitted to a linear discriminant analysis.
Calculate membership probability of each individual to per cluster.
Fig 3: Membership probability of each individual to clusters identified using k-means hierarchical clustering.
Individuals are grouped within sample locations in individuals panels as FLA, FLGS, FLGN, MB, MISS, CS, LA, CC, and CAMP.
Use k-means clustering to identify clusters based on genetic similarity (data transformed using PCA). For clustering it is appropriate to retain all PCs (all variance in the data set), as overfitting is not an issue.
Fig 1: AIC for data clustered into K = 1 - 40 groups using k-means clustering.
Minimum value for AIC is obtained for K=4.
Perform stratified cross-validation of DAPC using a range of retained PCs, while keeping the number of discriminant functions fixed for K = 2 - 5 to determine the most appropriate number of principle components to retain sufficient variance to discriminate but not overfit the data.
Retaining too few PCs may result in important variance not being retained and therefore not informing the clustering analysis, while retaining too many PCs results in overfitting, i.e. assignment success decreases.
Fig 2: Proportion of test individuals correctly assigned to their cluster of origin based on number of retained PCs for K = 2 - 9.
The optimum number of principle components, %-variance retained and the mean optimum assignment succes for K = 2 - 10 clusters.
| k_clust | optPC | variance | mean_opt_success |
|---|---|---|---|
| 2 | 50 | 64.01 | 1.00 |
| 3 | 50 | 36.20 | 1.00 |
| 4 | 50 | 20.25 | 1.00 |
| 5 | 150 | 50.72 | 0.94 |
Genetic data is scaled and centered, then transformed using a PCA. Retained PCs are then transmitted to a linear discriminant analysis.
Calculate membership probability of each individual to per cluster.
Fig 3: Membership probability of each individual to clusters identified using k-means hierarchical clustering.
Individuals are grouped within sample locations in individuals panels as FLA, FLGS, FLGN, MB, MISS, CS, LA, CC, and CAMP.
Implemented in Arlequin
Pairwise FST (Weir and Cockerham 1984Weir, B. S., and C. Clark Cockerham. 1984. “Estimating F-Statistics for the Analysis of Population Structure.” Evolution 38 (6): 1358. https://doi.org/10.2307/2408641.) was calculated as a post hoc test for significant differences between oceans, among geographic regions, and estuaries. Significance determined using permuted p-values (1,000 iterations) and 95% confidence intervals around each estimate generated by resampling loci 10,000 times with replacement using functions implemented in hierfstat and assigner.
PAIRWISE COMPARISON BETWEEN OCEAN BASINS
Pairwise Fst among ocean basins.
| POP1 | POP2 | N_MARKERS | FST | CI_LOW | CI_HIGH | PVAL |
|---|---|---|---|---|---|---|
| GULF | ATL | 5554 | 0.027023 | 0.0257106 | 0.0283911 | 0.000999 |
Atlantic and Gulf samples are significantly different from each other.
PAIRWISE COMPARISONS AMONG GEOGRAPHIC REGIONS
Pairwise Fst among geographic regions. P-values and 95%-CI recored in table.
| POP1 | POP2 | N_MARKERS | FST | CI_LOW | CI_HIGH | PVAL |
|---|---|---|---|---|---|---|
| SGULF | SWATL | 5360 | 0.0512467 | 0.0489500 | 0.0536600 | 0.0009990 |
| SWATL | WGULF | 5408 | 0.0298616 | 0.0282982 | 0.0314237 | 0.0009990 |
| SWATL | CGULF | 5499 | 0.0288954 | 0.0274644 | 0.0303701 | 0.0009990 |
| EGULF | SWATL | 5437 | 0.0267737 | 0.0254036 | 0.0281738 | 0.0009990 |
| SGULF | EGULF | 5494 | 0.0258652 | 0.0244581 | 0.0272891 | 0.0009990 |
| SGULF | CGULF | 5520 | 0.0248761 | 0.0233697 | 0.0264718 | 0.0009990 |
| SGULF | WGULF | 5465 | 0.0227486 | 0.0212929 | 0.0242021 | 0.0009990 |
| EGULF | CGULF | 5523 | 0.0051954 | 0.0047239 | 0.0056983 | 0.0009990 |
| EGULF | WGULF | 5486 | 0.0051723 | 0.0046221 | 0.0057346 | 0.0009990 |
| CGULF | WGULF | 5517 | 0.0001506 | 0.0000000 | 0.0004078 | 0.2107892 |
With the exception of the central and western Gulf, all geographic regions are significantly different from each other.
PAIRWISE COMPARISONS AMONG SAMPLE LOCATIONS
Pairwise Fst among sample locations. P-values and 95%-CI recored in table.
| POP1 | POP2 | N_MARKERS | FST | CI_LOW | CI_HIGH | PVAL |
|---|---|---|---|---|---|---|
| CAMP | FLA | 5360 | 0.0512467 | 0.0488922 | 0.0536464 | 0.0009990 |
| FLA | CS | 5184 | 0.0306518 | 0.0288356 | 0.0324278 | 0.0009990 |
| FLA | LA | 5368 | 0.0303556 | 0.0287824 | 0.0319644 | 0.0009990 |
| FLA | CC | 5408 | 0.0298616 | 0.0283038 | 0.0314266 | 0.0009990 |
| FLA | MISS | 5335 | 0.0297276 | 0.0281899 | 0.0313440 | 0.0009990 |
| FLA | MB | 5369 | 0.0285740 | 0.0270575 | 0.0301262 | 0.0009990 |
| FLGS | FLA | 5352 | 0.0273936 | 0.0259640 | 0.0288965 | 0.0009990 |
| FLGN | FLA | 5324 | 0.0269153 | 0.0254449 | 0.0284279 | 0.0009990 |
| CAMP | FLGS | 5456 | 0.0264596 | 0.0249415 | 0.0280057 | 0.0009990 |
| CAMP | FLGN | 5434 | 0.0251159 | 0.0236080 | 0.0265965 | 0.0009990 |
| CAMP | LA | 5451 | 0.0250990 | 0.0234492 | 0.0268720 | 0.0009990 |
| CAMP | MB | 5455 | 0.0249516 | 0.0233432 | 0.0265992 | 0.0009990 |
| CAMP | MISS | 5426 | 0.0248768 | 0.0231738 | 0.0266535 | 0.0009990 |
| CAMP | CS | 5361 | 0.0242077 | 0.0223701 | 0.0260826 | 0.0009990 |
| CAMP | CC | 5465 | 0.0227486 | 0.0212690 | 0.0242935 | 0.0009990 |
| FLGS | LA | 5444 | 0.0064339 | 0.0057135 | 0.0071633 | 0.0009990 |
| FLGN | LA | 5426 | 0.0058392 | 0.0051129 | 0.0066202 | 0.0009990 |
| FLGS | CC | 5454 | 0.0056693 | 0.0050236 | 0.0063266 | 0.0009990 |
| FLGS | MISS | 5410 | 0.0055834 | 0.0048907 | 0.0063107 | 0.0009990 |
| FLGS | MB | 5427 | 0.0053704 | 0.0047443 | 0.0060005 | 0.0009990 |
| FLGS | CS | 5349 | 0.0054568 | 0.0044884 | 0.0064636 | 0.0009990 |
| FLGN | CC | 5434 | 0.0047991 | 0.0041527 | 0.0054663 | 0.0009990 |
| FLGN | MISS | 5394 | 0.0048103 | 0.0040661 | 0.0055689 | 0.0009990 |
| FLGN | CS | 5330 | 0.0049619 | 0.0040366 | 0.0059292 | 0.0009990 |
| FLGN | MB | 5415 | 0.0044797 | 0.0038322 | 0.0051598 | 0.0009990 |
| LA | MB | 5424 | 0.0006684 | 0.0002496 | 0.0010927 | 0.0219780 |
| CS | LA | 5353 | 0.0008099 | 0.0000898 | 0.0015353 | 0.0579421 |
| CS | MISS | 5327 | 0.0007802 | 0.0000203 | 0.0015433 | 0.0669331 |
| FLGS | FLGN | 5388 | 0.0002916 | 0.0000000 | 0.0007169 | 0.1668332 |
| CS | CC | 5390 | 0.0005656 | 0.0000000 | 0.0012437 | 0.1098901 |
| CS | MB | 5342 | 0.0005362 | 0.0000000 | 0.0012269 | 0.1228771 |
| CC | LA | 5443 | 0.0003122 | 0.0000000 | 0.0007022 | 0.1228771 |
| CC | MISS | 5443 | 0.0002717 | 0.0000000 | 0.0007181 | 0.1718282 |
| CC | MB | 5458 | 0.0003393 | 0.0000000 | 0.0007430 | 0.0999001 |
| LA | MISS | 5405 | 0.0004550 | 0.0000000 | 0.0009269 | 0.0929071 |
| MISS | MB | 5420 | 0.0000070 | 0.0000000 | 0.0004469 | 0.4725275 |
The Campeche and Atlantic sample location exhibit the strongest differentiation, followed by all other Atlantic-Gulf comparisons. All pairwise comparisons of locations in the northern Gulf are significantly different from the southern Gulf sample location. Within the northern Gulf, the northern and southern Florida samples are significantly different from sample locations in the central and western sample location but are not different from each other (within eastern Gulf region), similarly most pairwise comparisons among sample locations within the central and western Gulf regions are not significantly different from each other.
Calculate measures of genetic diversity based on allele frequencies.
# define groups to analyze ====
setPop(gen) <- ~OCEAN
gen_oce <- seppop(gen)
setPop(gen) <- ~REGION
gen_reg <- seppop(gen)
setPop(gen) <- ~POP
gen_pop <- seppop(gen)
gen_grp <- c(gen_oce, gen_reg, gen_pop)
gen_grp[["ALL"]] <- gen
# calculate diversity stats ====
loc_stats <- list()
for (p in names(gen_grp)) {
locA <- locus_table(gen_grp[[p]], index = "shannon") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
locB <- locus_table(gen_grp[[p]], index = "simpson") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
temp <- left_join(locA, locB)
locC <- locus_table(gen_grp[[p]], index = "invsimpson") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
loc_stats[[p]] <- left_join(temp, locC)
}
loc_stats <- ldply(loc_stats, data.frame) %>%
select(-Hexp) %>%
rename(GRP = `.id`,
SIMPSON_IDX = `X1.D`,
N_ALLELES = allele,
SHANNON_IDX = H,
STODD_TAYLOR_IDX = G,
EVENNESS = Evenness)
# calculate genetic diversity stats (heterozygosity-based) ====
loc_stats_2 <- list()
for (p in names(gen_grp)) {
dat <- genind2hierfstat(gen_grp[[p]])
stats <- basic.stats(dat)
loc_stats_2[[p]] <- stats$perloc %>%
rownames_to_column("LOCUS")
}
# combine into single data frame ====
loc_stats_2 <- ldply(loc_stats_2, data.frame) %>%
rename(GRP = `.id`)
loc_stats <- left_join(loc_stats, loc_stats_2) %>%
select(GRP, LOCUS, N_ALLELES, EVENNESS, Ho, Hs, Ht, Fis, SHANNON_IDX, SIMPSON_IDX, STODD_TAYLOR_IDX)
loc_stats[is.na(loc_stats)] <- NA
write_delim(loc_stats, "results/gendiv.locstats", delim = "\t")
Hs calculated as within population gene diversity (expected heterozygosity) according to (Nei 1987Nei, Masatoshi. 1987. Molecular Evolutionary Genetics. https://doi.org/10.7312/nei-92038.).
Distribution of expected heterozygosity per estuary
Median, mean +/- std gene diversity by sample location.
| GRP | median | mean | std |
|---|---|---|---|
| FLA | 0.2446 | 0.2714 | 0.2185 |
| FLGS | 0.2529 | 0.2822 | 0.2109 |
| FLGN | 0.2530 | 0.2834 | 0.2111 |
| MB | 0.2558 | 0.2840 | 0.2080 |
| MISS | 0.2508 | 0.2825 | 0.2104 |
| CS | 0.2605 | 0.2833 | 0.2170 |
| LA | 0.2522 | 0.2828 | 0.2095 |
| CC | 0.2526 | 0.2828 | 0.2077 |
| CAMP | 0.2559 | 0.2856 | 0.2099 |
Median, mean +/- std gene diversity by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 0.2446 | 0.2714 | 0.2185 |
| EGULF | 0.2530 | 0.2828 | 0.2110 |
| CGULF | 0.2554 | 0.2831 | 0.2112 |
| WGULF | 0.2526 | 0.2828 | 0.2077 |
| SGULF | 0.2559 | 0.2856 | 0.2099 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: Hs and GRP and LOCUS
## Friedman chi-squared = 294.88, df = 8, p-value < 0.00000000000000022
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results pairwise comparisons using Wilcoxon signed rank test (p-value printed).
Calculate allelic richness corrected for sample size using rarefaction.
Compare levels rarefied allele counts among estuaries.
Distribution of allelic richness measured as rarefied allele counts per estuary
Median, mean +/- std rarefied allelic richness by sample location.
| pop | median | mean | std |
|---|---|---|---|
| FLA | 2.0000 | 2.2822 | 0.9950 |
| FLGS | 2.0000 | 2.4670 | 1.0675 |
| FLGN | 2.0000 | 2.4749 | 1.0711 |
| MB | 2.0038 | 2.4961 | 1.0724 |
| MISS | 2.0000 | 2.4921 | 1.0852 |
| CS | 2.0000 | 2.4712 | 1.1223 |
| LA | 2.0000 | 2.4905 | 1.0755 |
| CC | 2.0242 | 2.5004 | 1.0680 |
| CAMP | 2.0000 | 2.5967 | 1.1897 |
Median, mean +/- std rarefied alllelic richness by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 2.0000 | 2.2822 | 0.9950 |
| EGULF | 2.0000 | 2.4710 | 1.0693 |
| CGULF | 2.0000 | 2.4875 | 1.0890 |
| WGULF | 2.0242 | 2.5004 | 1.0680 |
| SGULF | 2.0000 | 2.5967 | 1.1897 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: AR and pop and LOCUS
## Friedman chi-squared = 1928.9, df = 8, p-value < 0.00000000000000022
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results of pairwise comparison of allelic richness among estuaries using Wilcoxon signed rank test (p-value).
Evenness is calculated as the ratio of the number of abundant genotypes to the number of rarer genotypes calculated using the ratio of Stoddart & Tayolor index (diversity index weighted for more abundant alleles) & Shannon-Wiener index (diversity index weighted for more rare alleles).
Distribution of evenness of allelic diversity per estuary
Median, mean +/- std evenness of allele frequencies by sample location.
| GRP | median | mean | std |
|---|---|---|---|
| FLA | 0.6119 | 0.6328 | 0.1820 |
| FLGS | 0.5696 | 0.6029 | 0.1736 |
| FLGN | 0.5769 | 0.6083 | 0.1724 |
| MB | 0.5669 | 0.6005 | 0.1721 |
| MISS | 0.5727 | 0.6062 | 0.1710 |
| CS | 0.6119 | 0.6343 | 0.1686 |
| LA | 0.5667 | 0.6004 | 0.1724 |
| CC | 0.5650 | 0.5976 | 0.1711 |
| CAMP | 0.5650 | 0.6042 | 0.1628 |
Median, mean +/- std evenness of allele frequencies by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 0.6119 | 0.6328 | 0.1820 |
| EGULF | 0.5732 | 0.6056 | 0.1730 |
| CGULF | 0.5783 | 0.6099 | 0.1716 |
| WGULF | 0.5650 | 0.5976 | 0.1711 |
| SGULF | 0.5650 | 0.6042 | 0.1628 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: EVENNESS and GRP and LOCUS
## Friedman chi-squared = 225.98, df = 8, p-value < 0.00000000000000022
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results of pairwise comparisons of evenness of allelic diversity between estuaries (p-values printed).
Calculate measures of genetic diversity based on allele frequencies.
# define groups to analyze ====
setPop(gen_o) <- ~OCEAN
gen_oce <- seppop(gen_o)
setPop(gen_o) <- ~REGION
gen_reg <- seppop(gen_o)
setPop(gen_o) <- ~POP
gen_pop <- seppop(gen_o)
gen_grp <- c(gen_oce, gen_reg, gen_pop)
gen_grp[["ALL"]] <- gen_o
# calculate diversity stats ====
loc_stats <- list()
for (p in names(gen_grp)) {
locA <- locus_table(gen_grp[[p]], index = "shannon") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
locB <- locus_table(gen_grp[[p]], index = "simpson") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
temp <- left_join(locA, locB)
locC <- locus_table(gen_grp[[p]], index = "invsimpson") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
loc_stats[[p]] <- left_join(temp, locC)
}
loc_stats <- ldply(loc_stats, data.frame) %>%
select(-Hexp) %>%
rename(GRP = `.id`,
SIMPSON_IDX = `X1.D`,
N_ALLELES = allele,
SHANNON_IDX = H,
STODD_TAYLOR_IDX = G,
EVENNESS = Evenness)
# calculate genetic diversity stats (heterozygosity-based) ====
loc_stats_2 <- list()
for (p in names(gen_grp)) {
dat <- genind2hierfstat(gen_grp[[p]])
stats <- basic.stats(dat)
loc_stats_2[[p]] <- stats$perloc %>%
rownames_to_column("LOCUS")
}
# combine into single data frame ====
loc_stats_2 <- ldply(loc_stats_2, data.frame) %>%
rename(GRP = `.id`)
loc_stats <- left_join(loc_stats, loc_stats_2) %>%
select(GRP, LOCUS, N_ALLELES, EVENNESS, Ho, Hs, Ht, Fis, SHANNON_IDX, SIMPSON_IDX, STODD_TAYLOR_IDX)
loc_stats[is.na(loc_stats)] <- NA
write_delim(loc_stats, "results/gendiv.o.locstats", delim = "\t")
Hs calculated as within population gene diversity (expected heterozygosity) according to (Nei 1987Nei, Masatoshi. 1987. Molecular Evolutionary Genetics. https://doi.org/10.7312/nei-92038.).
Distribution of expected heterozygosity per estuary
Median, mean +/- std gene diversity by sample location.
| GRP | median | mean | std |
|---|---|---|---|
| FLA | 0.4010 | 0.3486 | 0.2014 |
| FLGS | 0.2166 | 0.2519 | 0.1908 |
| FLGN | 0.2419 | 0.2554 | 0.1869 |
| MB | 0.1674 | 0.2128 | 0.1790 |
| MISS | 0.1165 | 0.1975 | 0.1918 |
| CS | 0.1630 | 0.2175 | 0.1874 |
| LA | 0.1441 | 0.2106 | 0.1875 |
| CC | 0.1995 | 0.2182 | 0.1867 |
| CAMP | 0.2821 | 0.2742 | 0.1973 |
Median, mean +/- std gene diversity by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 0.4010 | 0.3486 | 0.2014 |
| EGULF | 0.2397 | 0.2536 | 0.1879 |
| CGULF | 0.1482 | 0.2096 | 0.1852 |
| WGULF | 0.1995 | 0.2182 | 0.1867 |
| SGULF | 0.2821 | 0.2742 | 0.1973 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: Hs and GRP and LOCUS
## Friedman chi-squared = 34.905, df = 8, p-value = 0.00002782
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results pairwise comparisons using Wilcoxon signed rank test (p-value printed).
Calculate allelic richness corrected for sample size using rarefaction.
Compare levels rarefied allele counts among estuaries.
Distribution of allelic richness measured as rarefied allele counts per estuary
Median, mean +/- std rarefied allelic richness by sample location.
| pop | median | mean | std |
|---|---|---|---|
| FLA | 2.0000 | 2.1606 | 0.7147 |
| FLGS | 1.9986 | 2.0739 | 0.6659 |
| FLGN | 1.9996 | 2.1229 | 0.6764 |
| MB | 2.0000 | 2.0027 | 0.6603 |
| MISS | 1.9616 | 1.9378 | 0.7307 |
| CS | 2.0000 | 2.0529 | 0.7399 |
| LA | 1.9860 | 1.9651 | 0.6567 |
| CC | 1.9993 | 1.9921 | 0.6841 |
| CAMP | 2.0000 | 2.1390 | 0.7786 |
Median, mean +/- std rarefied alllelic richness by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 2.0000 | 2.1606 | 0.7147 |
| EGULF | 1.9992 | 2.0984 | 0.6682 |
| CGULF | 1.9995 | 1.9896 | 0.6940 |
| WGULF | 1.9993 | 1.9921 | 0.6841 |
| SGULF | 2.0000 | 2.1390 | 0.7786 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: AR and pop and LOCUS
## Friedman chi-squared = 35.557, df = 8, p-value = 0.00002117
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results of pairwise comparison of allelic richness among estuaries using Wilcoxon signed rank test (p-value).
Evenness is calculated as the ratio of the number of abundant genotypes to the number of rarer genotypes calculated using the ratio of Stoddart & Tayolor index (diversity index weighted for more abundant alleles) & Shannon-Wiener index (diversity index weighted for more rare alleles).
Distribution of evenness of allelic diversity per estuary
Median, mean +/- std evenness of allele frequencies by sample location.
| GRP | median | mean | std |
|---|---|---|---|
| FLA | 0.8036 | 0.7607 | 0.1837 |
| FLGS | 0.6088 | 0.6464 | 0.1803 |
| FLGN | 0.6334 | 0.6334 | 0.1714 |
| MB | 0.5393 | 0.5923 | 0.1873 |
| MISS | 0.5174 | 0.5997 | 0.2006 |
| CS | 0.5802 | 0.6293 | 0.1993 |
| LA | 0.5428 | 0.6163 | 0.1989 |
| CC | 0.5789 | 0.6188 | 0.1918 |
| CAMP | 0.6764 | 0.6764 | 0.1644 |
Median, mean +/- std evenness of allele frequencies by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 0.8036 | 0.7607 | 0.1837 |
| EGULF | 0.6311 | 0.6398 | 0.1750 |
| CGULF | 0.5431 | 0.6092 | 0.1952 |
| WGULF | 0.5789 | 0.6188 | 0.1918 |
| SGULF | 0.6764 | 0.6764 | 0.1644 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: EVENNESS and GRP and LOCUS
## Friedman chi-squared = 24.907, df = 8, p-value = 0.001612
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results of pairwise comparisons of evenness of allelic diversity between estuaries (p-values printed).
Calculate measures of genetic diversity based on allele frequencies.
# define groups to analyze ====
setPop(gen_nn) <- ~OCEAN
gen_oce <- seppop(gen_nn)
setPop(gen_nn) <- ~REGION
gen_reg <- seppop(gen_nn)
setPop(gen_nn) <- ~POP
gen_pop <- seppop(gen_nn)
gen_grp <- c(gen_oce, gen_reg, gen_pop)
gen_grp[["ALL"]] <- gen_nn
# calculate diversity stats ====
loc_stats <- list()
for (p in names(gen_grp)) {
locA <- locus_table(gen_grp[[p]], index = "shannon") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
locB <- locus_table(gen_grp[[p]], index = "simpson") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
temp <- left_join(locA, locB)
locC <- locus_table(gen_grp[[p]], index = "invsimpson") %>%
as.data.frame() %>%
rownames_to_column("LOCUS")
loc_stats[[p]] <- left_join(temp, locC)
}
loc_stats <- ldply(loc_stats, data.frame) %>%
select(-Hexp) %>%
rename(GRP = `.id`,
SIMPSON_IDX = `X1.D`,
N_ALLELES = allele,
SHANNON_IDX = H,
STODD_TAYLOR_IDX = G,
EVENNESS = Evenness)
# calculate genetic diversity stats (heterozygosity-based) ====
loc_stats_2 <- list()
for (p in names(gen_grp)) {
dat <- genind2hierfstat(gen_grp[[p]])
stats <- basic.stats(dat)
loc_stats_2[[p]] <- stats$perloc %>%
rownames_to_column("LOCUS")
}
# combine into single data frame ====
loc_stats_2 <- ldply(loc_stats_2, data.frame) %>%
rename(GRP = `.id`)
loc_stats <- left_join(loc_stats, loc_stats_2) %>%
select(GRP, LOCUS, N_ALLELES, EVENNESS, Ho, Hs, Ht, Fis, SHANNON_IDX, SIMPSON_IDX, STODD_TAYLOR_IDX)
loc_stats[is.na(loc_stats)] <- NA
write_delim(loc_stats, "results/gendiv.n.locstats", delim = "\t")
Hs calculated as within population gene diversity (expected heterozygosity) according to (Nei 1987Nei, Masatoshi. 1987. Molecular Evolutionary Genetics. https://doi.org/10.7312/nei-92038.).
Distribution of expected heterozygosity per estuary
Median, mean +/- std gene diversity by sample location.
| GRP | median | mean | std |
|---|---|---|---|
| FLA | 0.2404 | 0.2686 | 0.2176 |
| FLGS | 0.2508 | 0.2811 | 0.2105 |
| FLGN | 0.2530 | 0.2823 | 0.2108 |
| MB | 0.2545 | 0.2835 | 0.2075 |
| MISS | 0.2504 | 0.2823 | 0.2097 |
| CS | 0.2605 | 0.2830 | 0.2165 |
| LA | 0.2522 | 0.2824 | 0.2089 |
| CC | 0.2518 | 0.2822 | 0.2071 |
| CAMP | 0.2485 | 0.2836 | 0.2093 |
Median, mean +/- std gene diversity by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 0.2404 | 0.2686 | 0.2176 |
| EGULF | 0.2524 | 0.2817 | 0.2106 |
| CGULF | 0.2538 | 0.2828 | 0.2107 |
| WGULF | 0.2518 | 0.2822 | 0.2071 |
| SGULF | 0.2485 | 0.2836 | 0.2093 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: Hs and GRP and LOCUS
## Friedman chi-squared = 324.44, df = 8, p-value < 0.00000000000000022
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results pairwise comparisons using Wilcoxon signed rank test (p-value printed).
Calculate allelic richness corrected for sample size using rarefaction.
Compare levels rarefied allele counts among estuaries.
Distribution of allelic richness measured as rarefied allele counts per estuary
Median, mean +/- std rarefied allelic richness by sample location.
| pop | median | mean | std |
|---|---|---|---|
| FLA | 1.9999 | 2.2735 | 0.9896 |
| FLGS | 2.0000 | 2.4611 | 1.0615 |
| FLGN | 2.0000 | 2.4683 | 1.0620 |
| MB | 2.0000 | 2.4918 | 1.0633 |
| MISS | 2.0000 | 2.4890 | 1.0774 |
| CS | 2.0000 | 2.4688 | 1.1167 |
| LA | 2.0000 | 2.4871 | 1.0687 |
| CC | 2.0000 | 2.4960 | 1.0614 |
| CAMP | 2.0000 | 2.5859 | 1.1789 |
Median, mean +/- std rarefied alllelic richness by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 1.9999 | 2.2735 | 0.9896 |
| EGULF | 2.0000 | 2.4647 | 1.0617 |
| CGULF | 2.0000 | 2.4842 | 1.0817 |
| WGULF | 2.0000 | 2.4960 | 1.0614 |
| SGULF | 2.0000 | 2.5859 | 1.1789 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: AR and pop and LOCUS
## Friedman chi-squared = 1942.1, df = 8, p-value < 0.00000000000000022
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results of pairwise comparison of allelic richness among estuaries using Wilcoxon signed rank test (p-value).
Evenness is calculated as the ratio of the number of abundant genotypes to the number of rarer genotypes calculated using the ratio of Stoddart & Tayolor index (diversity index weighted for more abundant alleles) & Shannon-Wiener index (diversity index weighted for more rare alleles).
Distribution of evenness of allelic diversity per estuary
Median, mean +/- std evenness of allele frequencies by sample location.
| GRP | median | mean | std |
|---|---|---|---|
| FLA | 0.6071 | 0.6304 | 0.1815 |
| FLGS | 0.5681 | 0.6022 | 0.1737 |
| FLGN | 0.5755 | 0.6076 | 0.1726 |
| MB | 0.5663 | 0.6003 | 0.1720 |
| MISS | 0.5727 | 0.6062 | 0.1708 |
| CS | 0.6119 | 0.6342 | 0.1684 |
| LA | 0.5662 | 0.6001 | 0.1723 |
| CC | 0.5650 | 0.5973 | 0.1709 |
| CAMP | 0.5650 | 0.6031 | 0.1629 |
Median, mean +/- std evenness of allele frequencies by region.
| REGION | median | mean | std |
|---|---|---|---|
| SWATL | 0.6071 | 0.6304 | 0.1815 |
| EGULF | 0.5714 | 0.6049 | 0.1732 |
| CGULF | 0.5783 | 0.6097 | 0.1715 |
| WGULF | 0.5650 | 0.5973 | 0.1709 |
| SGULF | 0.5650 | 0.6031 | 0.1629 |
Test for significant differences among estuaries using Friedman’s test.
##
## Friedman rank sum test
##
## data: EVENNESS and GRP and LOCUS
## Friedman chi-squared = 222, df = 8, p-value < 0.00000000000000022
Test for significant pairwise differences between estuaries using Wilcoxon signed rank test.
Results of pairwise comparisons of evenness of allelic diversity between estuaries (p-values printed).